WHAT IS CLAIMED IS:

1. A method for at least one of genotyping and haplotyping a sequence of 
polymorphic genetic loci in a deoxyribonucleic acid (DNA) sample or 
identifying a strain variant from the DNA sample, comprising: 

i) providing one or more microarrays that include a set of oligonucleotide
probes that are capable of detecting the at least one of the genotypes
and the haplotypes or the strain variant;
ii) hybridizing the DNA sample to the one or more microarrays to create a
hybridization pattern; and
iii) determining at least one of a genotype and a haplotype or a strain
variant based on the hybridization pattern.

2. The method of claim 1, wherein the one or more microarrays include a set of 
oligonucleotide probes that are capable of detecting at least one of all known 
genotypes and all known haplotypes at the polymorphic genetic loci or the
strain identification.

3. The method of claim 1, wherein the one or more microarrays are configured to 
include at least one of an optimal set and an optimal arrangement of 
oligonucleotide probes. 

4. The method of claim 3, further comprising optimizing the at least one of the set
and arrangement of oligonucleotides based on the following:

$$\min \sum_{\text{type } j} w_j \, E\left[ I_{T_j \neq f_j} \right] \iff \min \sum_{\text{type } j} w_j \Pr(T_j \neq f_j),$$

wherein $T_j$ is a true allele contained in the DNA sample, $f_j$ is the allele
determined by the hybridization step,
$I_X = \begin{cases} 1, & \text{if } X \text{ is true} \\ 0, & \text{otherwise} \end{cases}$, and $w_j$ is a
weight assigned to at least one of the genotype and the haplotype $j$.
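
As an illustration of the objective recited in claim 4, the following is a minimal sketch and not the claimed implementation. It assumes repeated allele calls are available for each type j and estimates Pr(T_j != f_j) by the empirical mean of the indicator; the function name `weighted_error` and its argument layout are hypothetical.

```python
from typing import Sequence


def weighted_error(weights: Sequence[float],
                   true_alleles: Sequence[Sequence[str]],
                   called_alleles: Sequence[Sequence[str]]) -> float:
    """Estimate sum_j w_j * Pr(T_j != f_j) from repeated calls per type j.

    true_alleles[j] and called_alleles[j] hold the true and the called
    allele for each trial of type j (hypothetical input layout).
    """
    total = 0.0
    for w, truths, calls in zip(weights, true_alleles, called_alleles):
        # Empirical mean of the indicator I_{T_j != f_j} over the trials.
        mismatches = sum(1 for t, f in zip(truths, calls) if t != f)
        total += w * mismatches / len(truths)
    return total
```

With all weights equal to 1 (as in claim 5), this reduces to the sum of per-type miscall rates; a probe set with a smaller value is preferred under this objective.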
5. The method of claim 4, wherein the weights are provided as follows:
$w_j = 1 \;\forall j$, wherein $\forall j$ is a set of at least one of all known genotypes and all
known haplotypes at one or more predetermined polymorphic genetic loci.

6. The method of claim 4, wherein the weights are provided as follows: $w_j$ is
different for each genotype or haplotype.

7. The method of claim 4, wherein step (iii) produces a vector of n 
measurements, wherein n is a number of probes contained on the one or more 
microarrays. 

8. The method of claim 7, wherein the n potential probes provided to identify N
known genotypes or haplotypes are each associated with a response vector
$D_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

9. The method of claim 8, further comprising generating a graph G on vertices 
corresponding to probe response vectors. 

10. The method of claim 9, wherein the graph G is a complete edge-weighted and
vertex-weighted undirected graph G = (V, E) provided on n vertices, wherein n
is the number of potential probes.

11. The method of claim 10, wherein the weights w of each vertex v and each edge
e are constrained by: 0 < w(v), w(e) < 1.

12. The method of claim 11, wherein the weight w of a vertex v is set to:
w(v) = min{fraction of 0's, fraction of 1's}.

13. The method of claim 11, wherein the weight w of an edge e ={u,v} is set to: 

w(e) = Hamming distance/vector length, 

wherein Hamming distance is measured between the probe response vectors
corresponding to vertices u and v, and vector length is the length of said probe
response vectors, namely, N.
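
The vertex and edge weights of claims 12 and 13 can be sketched as follows. This is an illustrative example only; the function names and the dictionary-based graph representation are assumptions made here, not part of the claims.

```python
from itertools import combinations


def vertex_weight(vec):
    """min{fraction of 0's, fraction of 1's} for one binary response vector."""
    ones = sum(vec) / len(vec)
    return min(ones, 1.0 - ones)


def edge_weight(u, v):
    """Hamming distance between two response vectors divided by their length N."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def build_graph(vectors):
    """Complete weighted graph on n probes (hypothetical representation).

    Returns vertex weights keyed by probe index and edge weights keyed by
    index pairs (i, j) with i < j.
    """
    w_v = {i: vertex_weight(vec) for i, vec in enumerate(vectors)}
    w_e = {(i, j): edge_weight(vectors[i], vectors[j])
           for i, j in combinations(range(len(vectors)), 2)}
    return w_v, w_e
```

A probe whose response vector is all 0's or all 1's receives vertex weight 0, since it does not distinguish any of the N genotypes or haplotypes.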

14. The method of claim 10, further comprising modifying the graph G by
thresholding the edges such that the modified graph $G_{mod}$ is defined as
$G_{mod} = (V, E_{mod})$, wherein $E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold
value.
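
The thresholding of claim 14 amounts to a filter over the weighted edge set. The sketch below assumes edge weights are stored in a dictionary keyed by vertex pairs; `threshold_edges` is a hypothetical name.

```python
def threshold_edges(edge_weights, p):
    """Return E_mod = {e in E : w(e) < p} as a set of vertex pairs."""
    return {e for e, w in edge_weights.items() if w < p}
```

On one reading, the edges that survive connect probes whose response vectors are similar (small normalized Hamming distance), so an independent set in the modified graph is a set of pairwise dissimilar probes.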

15. The method of claim 14, wherein, for the modified graph $G_{mod}$ and the probe
set size M, the following is performed:

i) initializing a current-best list of independent sets with
associated information weights,

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on
vertex boosting weights,

iv) choosing a random subset of vertices of a specified size M
based on the probability distribution,

v) eliminating one of the end-point vertices in each of the edges
remaining in the induced subgraph on the random subset,

vi) modifying the vertex boosting weights by increasing the
weights of the vertices that are retained in the subset and
decreasing the weights of the vertices that were selected in step
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement
to the list of top independent sets is achieved.
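
Steps (i) through (vii) of claim 15 can be sketched as a randomized search. This is a minimal illustration under stated assumptions, not the claimed implementation: the "information weight" of a set is taken to be the sum of its vertex weights, the boosting update is multiplicative with hypothetical factors `up` and `down`, the endpoint with the smaller vertex weight is the one eliminated, and only the fixed-iteration stopping rule is shown.

```python
import random


def boosted_independent_sets(vertex_w, edges, M, iterations=200, top_k=5,
                             up=1.1, down=0.9, seed=0):
    """Randomized search for heavy independent sets of size at most M."""
    rng = random.Random(seed)
    vertices = list(vertex_w)
    best = []  # step (i): current-best list of (information weight, set)
    # step (ii): boosting weights start at w(v); epsilon keeps them positive
    boost = {v: vertex_w[v] + 1e-9 for v in vertices}
    for _ in range(iterations):  # step (vii): predetermined iteration count
        # steps (iii)-(iv): draw M distinct vertices, biased by boosting weight
        pool, chosen = vertices[:], []
        for _ in range(min(M, len(pool))):
            pick = rng.choices(pool, weights=[boost[v] for v in pool])[0]
            pool.remove(pick)
            chosen.append(pick)
        # step (v): drop one endpoint of every edge still inside the subset
        kept = set(chosen)
        for u, v in sorted(edges):
            if u in kept and v in kept:
                kept.discard(u if vertex_w[u] <= vertex_w[v] else v)
        # step (vi): reward retained vertices, penalize eliminated ones
        for v in chosen:
            boost[v] *= up if v in kept else down
        # update the current-best list
        entry = (sum(vertex_w[v] for v in kept), frozenset(kept))
        if entry not in best:
            best.append(entry)
            best.sort(key=lambda e: e[0], reverse=True)
            del best[top_k:]
    return best
```

Because the retained set only shrinks during step (v), no edge of the modified graph can have both endpoints retained, so every entry in the returned list is an independent set.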

16. The method of claim 15, wherein, for the modified graph $G_{mod}$ and the probe
set size M, steps (ii) through (vii) are repeated for a predetermined number of
iterations, each iteration starting with reinitializing vertex boosting weights to
vertex weights w(v) in step (ii).

17. The method of claim 16, wherein, for a given fixed small $0 < \epsilon \ll 1$, the probe
set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq 1) > 1 - \epsilon$.
18. The method of claim 16, wherein, for a given fixed small $0 < \epsilon \ll 1$ and a fixed
$\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq \alpha) > 1 - \epsilon$.
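
The inequalities of claims 17 and 18 can be checked numerically for a candidate M. The claims do not state the probability model, so the sketch below assumes, purely for illustration, that the M probes are drawn uniformly at random from the n potential probes and estimates the probability by Monte Carlo sampling. The function names are hypothetical.

```python
import random
from itertools import combinations


def min_pairwise_distance(codes):
    """Smallest Hamming distance over all pairs of code words."""
    return min(sum(a != b for a, b in zip(x, y))
               for x, y in combinations(codes, 2))


def estimate_separation_probability(vectors, M, alpha=1, trials=1000, seed=0):
    """Estimate Pr(all code pairs have Hamming distance >= alpha).

    vectors: n probe response vectors, each of length N. The code word of
    type t is the tuple of responses of the M sampled probes to type t.
    Assumes uniform random probe selection (an illustrative model only).
    """
    rng = random.Random(seed)
    n, N = len(vectors), len(vectors[0])
    hits = 0
    for _ in range(trials):
        probes = rng.sample(range(n), M)
        codes = [[vectors[p][t] for p in probes] for t in range(N)]
        if min_pairwise_distance(codes) >= alpha:
            hits += 1
    return hits / trials
```

A candidate M would then be accepted when the estimate exceeds 1 - epsilon for the chosen alpha.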

19. The method of claim 15, wherein the threshold p has a value to enable the
graph G to have a sparsity bounded by A < sparsity < B, wherein the sparsity is
definable by the average degree of a vertex in the graph G.

20. The method of claim 19, wherein the lower bound A is a relatively small 
constant, and the upper bound B is a function of the number of vertices n. 
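
The sparsity condition of claims 19 and 20 can be illustrated by computing the average vertex degree of the thresholded graph for several candidate thresholds. This is a sketch under the assumption that edge weights are stored in a dictionary keyed by vertex pairs; the function names and the candidate list are hypothetical.

```python
def average_degree(num_vertices, edges):
    """Average vertex degree of an undirected graph: 2|E| / |V|."""
    return 2 * len(edges) / num_vertices


def thresholds_within_bounds(edge_weights, num_vertices, candidates, A, B):
    """Return the candidate thresholds p whose thresholded graph
    satisfies A < average degree < B."""
    ok = []
    for p in candidates:
        kept = [e for e, w in edge_weights.items() if w < p]
        if A < average_degree(num_vertices, kept) < B:
            ok.append(p)
    return ok
```

Here B is passed as a plain number; where B is a function of n, it would be evaluated for the given number of vertices before the call.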

21. A software arrangement which, when executed on a processing device,
configures the processing device to perform the steps comprising:

i) hybridizing the DNA sample to one or more microarrays to create a
hybridization pattern, the one or more microarrays including a set of
oligonucleotide probes that are capable of detecting at least one set of
genotypes and haplotypes for a sequence of polymorphic genetic loci
in a deoxyribonucleic acid (DNA) sample or identifying a strain
variant from the DNA sample; and

ii) determining at least one of a genotype and a haplotype or a strain
variant based on the hybridization pattern.

22. The software arrangement of claim 21, wherein the one or more microarrays
include a set of oligonucleotide probes that are capable of detecting at least
one of all known genotypes and all known haplotypes at the polymorphic
genetic loci or strain variation.

23. The software arrangement of claim 21, wherein the one or more microarrays
are configured to include at least one of an optimal set and an optimal
arrangement of oligonucleotide probes.

24. The software arrangement of claim 23, wherein the processing device is
further configured to optimize the at least one of the set and arrangement of
oligonucleotides based on the following:

$$\min \sum_{\text{type } j} w_j \, E\left[ I_{T_j \neq f_j} \right] \iff \min \sum_{\text{type } j} w_j \Pr(T_j \neq f_j),$$

wherein $T_j$ is a true allele contained in the DNA sample, $f_j$ is the allele
determined by the hybridization step,
$I_X = \begin{cases} 1, & \text{if } X \text{ is true} \\ 0, & \text{otherwise} \end{cases}$, and $w_j$ is a
weight assigned to at least one of the genotype and the haplotype $j$.

25. The software arrangement of claim 24, wherein the weights are provided as
follows: $w_j = 1 \;\forall j$, wherein $\forall j$ is a set of at least one of all known
genotypes and all known haplotypes at one or more predetermined
polymorphic genetic loci.

26. The software arrangement of claim 24, wherein the weights are provided as
follows: $w_j$ is different for each genotype or haplotype.

27. The software arrangement of claim 26, wherein step (i) produces a vector of n 
measurements, wherein n is a number of probes contained on the microarray. 

28. The software arrangement of claim 26, wherein the n potential probes
provided to identify N known genotypes or haplotypes are each associated
with a response vector $D_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

29. The software arrangement of claim 28, wherein the processing device is 
further configured to generate a graph G on vertices corresponding to probe 
response vectors. 

30. The software arrangement of claim 29, wherein the graph G is a complete
edge-weighted and vertex-weighted undirected graph G = (V, E) provided on n
vertices, wherein n is the number of potential probes.

31. The software arrangement of claim 30, wherein the weights w of each vertex v
and each edge e are constrained by: 0 < w(v), w(e) < 1.
32. The software arrangement of claim 31, wherein the weight w of a vertex v is
set to:

w(v) = min{fraction of 0's, fraction of 1's}/100.

33. The software arrangement of claim 31, wherein the weight w of an edge e
= {u,v} is set to:

w(e) = Hamming distance/vector length, 

wherein Hamming distance is measured between the probe response vectors 
corresponding to vertices u and v, and vector length is the length of said probe 
response vectors, namely, N. 

34. The software arrangement of claim 30, wherein the processing device is
further configured to modify the graph G by thresholding the edges such that
the modified graph $G_{mod}$ is defined as $G_{mod} = (V, E_{mod})$, wherein
$E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold value.

35. The software arrangement of claim 34, wherein, for the modified graph $G_{mod}$
and the probe set size M, the following is performed:

i) initializing a current-best list of independent sets with
associated information weights,

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on
vertex boosting weights,

iv) choosing a random subset of vertices of a specified size M
based on the probability distribution,

v) eliminating one of the end-point vertices in each of the edges
remaining in the induced subgraph on the random subset,

vi) modifying the vertex boosting weights by increasing the
weights of the vertices that are retained in the subset and
decreasing the weights of the vertices that were selected in step
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement
to the list of top independent sets is achieved.

36. The software arrangement of claim 35, wherein, for the modified graph $G_{mod}$
and the probe set size M, steps (ii) through (vii) are repeated for a
predetermined number of iterations, each iteration starting with reinitializing
vertex boosting weights to vertex weights w(v) in step (ii).

37. The software arrangement of claim 36, wherein, for a given fixed small
$0 < \epsilon \ll 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq 1) > 1 - \epsilon$.

38. The software arrangement of claim 36, wherein, for a given fixed small
$0 < \epsilon \ll 1$ and a fixed $\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq \alpha) > 1 - \epsilon$.

39. The software arrangement of claim 36, wherein the threshold p has a value to
enable the graph G to have a sparsity bounded by A < sparsity < B, wherein the
sparsity is definable by the average degree of a vertex in the graph G.

40. The software arrangement of claim 39, wherein the lower bound A is a 
relatively small constant, and the upper bound B is a function of the number of 
vertices n. 

41. A storage medium which includes thereon a software arrangement for
providing one or more microarrays, which is capable of configuring a
processing arrangement to perform the steps comprising:

i) receiving information regarding a hybridization of the DNA sample to
one or more microarrays to create a hybridization pattern, the one or
more microarrays including a set of oligonucleotide probes that are
capable of detecting at least one set of genotypes and haplotypes for a
sequence of polymorphic genetic loci in a deoxyribonucleic acid
(DNA) sample or identifying a strain variant from the DNA sample;
and

ii) determining at least one of a genotype and a haplotype or a strain
variant based on the hybridization pattern.

42. The storage medium of claim 41, wherein the one or more microarrays include
a set of oligonucleotide probes that are capable of detecting at least one of all
known genotypes and all known haplotypes at the polymorphic genetic loci or
strain variation.

43. The storage medium of claim 42, wherein the one or more microarrays are 
configured to include at least one of an optimal set and an optimal 
arrangement of oligonucleotide probes. 

44. The storage medium of claim 43, wherein the processing device is further
configured to optimize the at least one of the set and arrangement of
oligonucleotides based on the following:

$$\min \sum_{\text{type } j} w_j \, E\left[ I_{T_j \neq f_j} \right] \iff \min \sum_{\text{type } j} w_j \Pr(T_j \neq f_j),$$

wherein $T_j$ is a true allele contained in the DNA sample, $f_j$ is the allele
determined by the hybridization step,
$I_X = \begin{cases} 1, & \text{if } X \text{ is true} \\ 0, & \text{otherwise} \end{cases}$, and $w_j$ is a
weight assigned to at least one of the genotype and the haplotype $j$.

45. The storage medium of claim 44, wherein the weights are provided as follows:
$w_j = 1 \;\forall j$, wherein $\forall j$ is a set of at least one of all known genotypes and all
known haplotypes at one or more predetermined polymorphic genetic loci.

46. The storage medium of claim 44, wherein the weights are provided as follows:
$w_j$ is different for each genotype or haplotype.

47. The storage medium of claim 44, wherein step (i) produces a vector of n 
measurements, wherein n is a number of probes contained on the microarray. 
48. The storage medium of claim 46, wherein the n potential probes provided to
identify N known genotypes or haplotypes are each associated with a response
vector $D_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

49. The storage medium of claim 48, wherein the processing device is further
configured to generate a graph G on vertices corresponding to probe response
vectors.

50. The storage medium of claim 49, wherein the graph G is a complete edge- 
weighted and vertex-weighted undirected graph G = (V, E) provided on n 
vertices, wherein n is the number of potential probes. 

51. The storage medium of claim 50, wherein the weights w of each vertex v and
each edge e are constrained by: 0 < w(v), w(e) < 1.

52. The storage medium of claim 51, wherein the weight w of a vertex v is set to: 

w(v) = min{fraction of 0's, fraction of 1's}.

53. The storage medium of claim 52, wherein the weight w of an edge e = {u,v} is
set to:

w(e) = Hamming distance/vector length, 

wherein Hamming distance is measured between the probe response vectors 
corresponding to vertices u and v, and vector length is the length of said probe 
response vectors, namely, N. 

54. The storage medium of claim 53, wherein the processing device is further
configured to modify the graph G by thresholding the edges such that the
modified graph $G_{mod}$ is defined as $G_{mod} = (V, E_{mod})$, wherein
$E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold value.

55. The storage medium of claim 54, wherein, for the modified graph $G_{mod}$ and the
probe set size M, the following is performed:

i) initializing a current-best list of independent sets with
associated information weights,

ii) initializing vertex boosting weights to vertex weights w(v),

iii) defining a probability distribution on the vertex subset based on
vertex boosting weights,

iv) choosing a random subset of vertices of a specified size M
based on the probability distribution,

v) eliminating one of the end-point vertices in each of the edges
remaining in the induced subgraph on the random subset,

vi) modifying the vertex boosting weights by increasing the
weights of the vertices that are retained in the subset and
decreasing the weights of the vertices that were selected in step
(iv) but eliminated in step (v), and

vii) repeating steps (iii) through (vi) for at least one of a
predetermined number of iterations and until no improvement
to the list of top independent sets is achieved.

56. The storage medium of claim 54, wherein, for the modified graph $G_{mod}$ and the
probe set size M, steps (ii) through (vii) are repeated for a predetermined
number of iterations, each iteration starting with reinitializing vertex boosting
weights to vertex weights w(v) in step (ii).

57. The storage medium of claim 54, wherein, for a given fixed small $0 < \epsilon \ll 1$,
the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq 1) > 1 - \epsilon$.

58. The storage medium of claim 54, wherein, for a given fixed small $0 < \epsilon \ll 1$
and a fixed $\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq \alpha) > 1 - \epsilon$.

59. The storage medium of claim 54, wherein the threshold p has a value to enable
the graph G to have a sparsity bounded by A < sparsity < B, wherein the
sparsity is definable by the average degree of a vertex in the graph G.

60. The storage medium of claim 59, wherein the lower bound A is a relatively
small constant, and the upper bound B is a function of the number of vertices
n.

61. A system for at least one of genotyping and haplotyping polymorphic genetic
loci or strain identification in a deoxyribonucleic acid (DNA) sample, 
comprising: 

a processing arrangement which is capable of being programmed to: 

i) receive information regarding a hybridization of the DNA sample to 
one or more microarrays to create a hybridization pattern, the one or 
more microarrays including a set of oligonucleotide probes that are 
capable of detecting at least one set of genotypes and haplotypes for a 
sequence of polymorphic genetic loci in a deoxyribonucleic acid 
(DNA) sample or identifying a strain variant from the DNA sample; 
and 

ii) determine at least one of a genotype and a haplotype or a strain variant 
based on the hybridization pattern. 

62. The system of claim 61, wherein the one or more microarrays include a set of 
oligonucleotide probes that are capable of detecting at least one of all known 
genotypes and all known haplotypes at the polymorphic genetic loci or strain 
variation. 

63. The system of claim 61, wherein the one or more microarrays are configured 
to include at least one of an optimal set and an optimal arrangement of 
oligonucleotide probes. 

64. The system of claim 63, wherein the processing arrangement is further
programmed to optimize the at least one of the set and arrangement of
oligonucleotides based on the following:

$$\min \sum_{\text{type } j} w_j \, E\left[ I_{T_j \neq f_j} \right] \iff \min \sum_{\text{type } j} w_j \Pr(T_j \neq f_j),$$

wherein $T_j$ is a true allele contained in the DNA sample, $f_j$ is the allele
determined by the hybridization step,
$I_X = \begin{cases} 1, & \text{if } X \text{ is true} \\ 0, & \text{otherwise} \end{cases}$, and $w_j$ is a
weight assigned to at least one of the genotype and the haplotype $j$.

65. The system of claim 64, wherein the weights are provided as follows:
$w_j = 1 \;\forall j$, wherein $\forall j$ is a set of at least one of all known genotypes and all
known haplotypes at one or more predetermined polymorphic genetic loci.

66. The system of claim 64, wherein the weights are provided as follows: $w_j$ is
different for each genotype or haplotype.

67. The system of claim 64, wherein step (i) produces a vector of n measurements,
wherein n is a number of probes contained on the one or more microarrays.

68. The system of claim 66, wherein the n potential probes provided to identify N
known genotypes or haplotypes are each associated with a response vector
$D_j \in \{0,1\}^N$, $j = 1, \ldots, n$.

69. The system of claim 68, wherein the processing arrangement is further
programmed to generate a graph G on vertices corresponding to probe
response vectors.

70. The system of claim 69, wherein the graph G is a complete edge-weighted and
vertex-weighted undirected graph G = (V, E) provided on n vertices, wherein n
is the number of potential probes.

71. The system of claim 70, wherein the weights w of each vertex v and each edge
e are constrained by: 0 < w(v), w(e) < 1.

72. The system of claim 71, wherein the weight w of a vertex v is set to: 

w(v) = min{fraction of 0's, fraction of 1's}.

73. The system of claim 71, wherein the weight w of an edge e = {u,v} is set to:
w(e) = Hamming distance/vector length,
wherein Hamming distance is measured between the probe response vectors
corresponding to vertices u and v, and vector length is the length of said probe
response vectors, namely, N.

74. The system of claim 70, wherein the processing arrangement is further
programmed to modify the graph G by thresholding the edges such that the
modified graph $G_{mod}$ is defined as $G_{mod} = (V, E_{mod})$, wherein
$E_{mod} = \{e \in E : w(e) < p\}$, and p is a selected threshold value.

75. The system of claim 74, wherein, for the modified graph $G_{mod}$ and the probe
set size M, the following is performed:

i) initializing a current-best list of independent sets with 
associated information weights, 

ii) initializing vertex boosting weights to vertex weights w(v), 

iii) defining a probability distribution on the vertex subset based on 
vertex boosting weights, 

iv) choosing a random subset of vertices of a specified size M 
based on the probability distribution, 

v) eliminating one of the end-point vertices in each of the edges 
remaining in the induced subgraph on the random subset, 

vi) modifying the vertex boosting weights by increasing the 
weights of the vertices that are retained in the subset and 
decreasing the weights of the vertices that were selected in step 
(iv) but eliminated in step (v), and 

vii) repeating steps (iii) through (vi) for at least one of a 
predetermined number of iterations and until no improvement 
to the list of top independent sets is achieved. 

76. The system of claim 75, wherein, for the modified graph $G_{mod}$ and the probe
set size M, steps (ii) through (vii) are repeated for a predetermined number of 
iterations, each iteration starting with reinitializing vertex boosting weights to 
vertex weights w(v) in step (ii). 
77. The system of claim 76, wherein, for a given fixed small $0 < \epsilon \ll 1$, the probe
set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq 1) > 1 - \epsilon$.

78. The system of claim 76, wherein, for a given fixed small $0 < \epsilon \ll 1$ and a fixed
$\alpha > 1$, the probe set size M satisfies an inequality
$\Pr(\forall \text{ code pairs, Hamming distance} \geq \alpha) > 1 - \epsilon$.

79. The system of claim 76, wherein the threshold p has a value to enable the 
graph G to have a sparsity bounded by A < sparsity < B, wherein the sparsity is 
definable by the average degree of a vertex in the graph G. 

80. The system of claim 79, wherein the lower bound A is a relatively small
constant, and the upper bound B is a function of the number of vertices n.